NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided

Zhan, Hongli; Zheng, Allen; Lee, Yoon Kyung; Suh, Jina; Li, Junyi Jessy; Ong, Desmond C (October 2025, First Conference on Language Modeling (COLM 2024))

Large language models (LLMs) have offered new opportunities for emotional support, and recent work has shown that they can produce empathic responses to people in distress. However, long-term mental well-being requires emotional self-regulation, where a one-time empathic response falls short. This work takes a first step by engaging with cognitive reappraisals, a strategy from psychology practitioners that uses language to change, in a targeted way, the negative appraisals an individual makes of the situation; such appraisals are known to sit at the root of human emotional experience. We hypothesize that psychologically grounded principles could enable such advanced psychology capabilities in LLMs, and design RESORT, which consists of a series of reappraisal constitutions across multiple dimensions that can be used as LLM instructions. We conduct a first-of-its-kind expert evaluation (by clinical psychologists with M.S. or Ph.D. degrees) of an LLM's zero-shot ability to generate cognitive reappraisal responses to medium-length social media messages asking for support. This fine-grained evaluation showed that even LLMs at the 7B scale, when guided by RESORT, are capable of generating empathic responses that can help users reappraise their situations.
Full Text Available
Behavioral Analysis of Information Salience in Large Language Models

https://doi.org/10.18653/v1/2025.findings-acl.1204

Trienes, Jan; Schlötterer, Jörg; Li, Junyi Jessy; Seifert, Christin (July 2025, Findings of the Association for Computational Linguistics: ACL 2025)

Large Language Models (LLMs) excel at text summarization, a task that requires models to select content based on its importance. However, the exact notion of salience that LLMs have internalized remains unclear. To bridge this gap, we introduce an explainable framework to systematically derive and investigate information salience in LLMs through their summarization behavior. Using length-controlled summarization as a behavioral probe into the content selection process, and tracing the answerability of Questions Under Discussion throughout, we derive a proxy for how models prioritize information. Our experiments on 13 models across four datasets reveal that LLMs have a nuanced, hierarchical notion of salience, generally consistent across model families and sizes. While models show highly consistent behavior and hence salience patterns, this notion of salience cannot be accessed through introspection, and only weakly correlates with human perceptions of information salience.
Full Text Available
SPRI: Aligning Large Language Models with Context-Situated Principles

Zhan, Hongli; Azmat, Muneeza; Horesh, Raya; Li, Junyi Jessy; Yurochkin, Mikhail (July 2025, Proceedings of the Forty-Second International Conference on Machine Learning (ICML 2025))

Aligning Large Language Models to integrate and reflect human values, especially for tasks that demand intricate human oversight, is arduous, since it is resource-intensive and time-consuming to depend on human expertise for context-specific guidance. Prior work has utilized predefined sets of rules or principles to steer the behavior of models (Bai et al., 2022; Sun et al., 2023). However, these principles tend to be generic, making it challenging to adapt them to each individual input query or context. In this work, we present Situated-PRInciples (SPRI), a framework requiring minimal or no human effort that is designed to automatically generate guiding principles in real time for each input query and to utilize them to align each response. We evaluate SPRI on three tasks and show that 1) SPRI can derive principles for a complex domain-specific task that lead to performance on par with expert-crafted ones; 2) SPRI-generated principles lead to instance-specific rubrics that outperform prior LLM-as-a-judge frameworks; and 3) using SPRI to generate synthetic SFT data leads to substantial improvement in truthfulness.
Full Text Available
A Tool for Generating Exceptional Behavior Tests With Large Language Models

https://doi.org/10.1145/3696630.3728608

Zhong, Linghan; Yuan, Samuel; Zhang, Jiyang; Liu, Yu; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (June 2025, ACM)

Full Text Available
exLong: Generating Exceptional Behavior Tests with Large Language Models

Zhang, Jiyang; Liu, Yu; Nie, Pengyu; Li, Junyi Jessy; Gligoric, Milos (April 2025, International Conference on Software Engineering)

Full Text Available
Is It JUST Semantics? A Case Study of Discourse Particle Understanding in LLMs

https://doi.org/10.18653/v1/2025.findings-acl.1117

Sheffield, William Berkeley; Misra, Kanishka; Pyatkin, Valentina; Deo, Ashwini; Mahowald, Kyle; Li, Junyi Jessy (January 2025, Findings of the Association for Computational Linguistics: ACL 2025)

Discourse particles are crucial elements that subtly shape the meaning of text. These words, often polyfunctional, give rise to nuanced and often quite disparate semantic/discourse effects, as exemplified by the diverse uses of the particle *just* (e.g., exclusive, temporal, emphatic). This work investigates the capacity of LLMs to distinguish the fine-grained senses of English *just*, a well-studied example in formal semantics, using data meticulously created and labeled by expert linguists. Our findings reveal that while LLMs exhibit some ability to differentiate between broader categories, they struggle to fully capture more subtle nuances, highlighting a gap in their understanding of discourse particles.
Full Text Available
Do *they* mean ‘us’? Interpreting Referring Expression variation under Intergroup Bias

https://doi.org/10.18653/v1/2024.findings-emnlp.571

Govindarajan, Venkata S; Zang, Matianyu; Mahowald, Kyle; Beaver, David; Li, Junyi Jessy (November 2024, Findings of the Association for Computational Linguistics: EMNLP 2024, Association for Computational Linguistics)

The variations between in-group and out-group speech (intergroup bias) are subtle and could underlie many social phenomena like stereotype perpetuation and implicit bias. In this paper, we model intergroup bias as a tagging task on English sports comments from forums dedicated to fandom for NFL teams. We curate a dataset of over 6 million game-time comments from opposing perspectives (the teams in the game), each comment grounded in a non-linguistic description of the events that precipitated these comments (live win probabilities for each team). Expert and crowd annotations justify modeling the bias through tagging of implicit and explicit referring expressions and reveal the rich, contextual understanding of language and the world required for this task. For large-scale analysis of intergroup variation, we use LLMs for automated tagging, and discover that LLMs occasionally perform better when prompted with linguistic descriptions of the win probability at the time of the comment, rather than numerical probability. Further, large-scale tagging of comments using LLMs uncovers linear variations in the form of referent across win probabilities that distinguish in-group and out-group utterances.
Full Text Available
Large Language Models Produce Responses Perceived to be Empathic

https://doi.org/10.1109/ACII63134.2024.00012

Lee, Yoon Kyung; Suh, Jina; Zhan, Hongli; Li, Junyi Jessy; Ong, Desmond C (September 2024, IEEE)

Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable “styles”, in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.
Full Text Available
Using Natural Language Explanations to Rescale Human Judgments

Wadhwa, Manya; Chen, Jifan; Li, Junyi Jessy; Durrett, Greg (July 2024, First Conference on Language Modeling (COLM))

Full Text Available
Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion

Singh, Smriti; Caragea, Cornelia; Li, Junyi Jessy (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Situations and events evoke emotions in humans, but to what extent do they inform the predictions of emotion detection models? This work investigates how well human-annotated emotion triggers correlate with the features that models deem salient when predicting emotions. First, we introduce a novel dataset, EmoTrigger, consisting of 900 social media posts sourced from three different datasets; these were annotated by experts for emotion triggers with high agreement. Using EmoTrigger, we evaluate the ability of large language models (LLMs) to identify emotion triggers, and conduct a comparative analysis of the features considered important for these tasks by LLMs and by fine-tuned models. Our analysis reveals that emotion triggers are largely not considered salient features by emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
Full Text Available